Parallel and Hierarchical Decision Making for Sparse Coding in Speech Recognition
نویسندگان
چکیده
Sparse coding exhibits promising performance in speech processing, mainly due to the large number of bases that can be used to represent speech signals. However, the high demand for computational power presents a major obstacle in the case of large datasets, as does the difficulty in utilising information scattered sparsely in high dimensional features. This paper reports the use of an online dictionary learning technique, proposed recently by the machine learning community, to learn large scale bases efficiently, and proposes a new parallel and hierarchical architecture to make use of the sparse information in high dimensional features. The approach uses multilayer perceptrons (MLPs) to model sparse feature subspaces and make local decisions accordingly; the latter are integrated by additional MLPs in a hierarchical way for making global decisions. Experiments on the WSJ database show that the proposed approach not only solves the problem of prohibitive computation with large-dimensional sparse features, but also provides better performance in a frame-level phone prediction task.
منابع مشابه
Face Recognition using an Affine Sparse Coding approach
Sparse coding is an unsupervised method which learns a set of over-complete bases to represent data such as image and video. Sparse coding has increasing attraction for image classification applications in recent years. But in the cases where we have some similar images from different classes, such as face recognition applications, different images may be classified into the same class, and hen...
متن کاملTraffic Scene Analysis using Hierarchical Sparse Topical Coding
Analyzing motion patterns in traffic videos can be exploited directly to generate high-level descriptions of the video contents. Such descriptions may further be employed in different traffic applications such as traffic phase detection and abnormal event detection. One of the most recent and successful unsupervised methods for complex traffic scene analysis is based on topic models. In this pa...
متن کاملVoice-based Age and Gender Recognition using Training Generative Sparse Model
Abstract: Gender recognition and age detection are important problems in telephone speech processing to investigate the identity of an individual using voice characteristics. In this paper a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and atom correction post-processing method. Similar to genera...
متن کاملHierarchical Recognition of Sparse Patterns in Large-scale Simultaneous Inference
We study how to accurately separate signals from noisy data and determine the patterns of the selected signals. Controlling the inflation of false positive errors is an important issue in largescale simultaneous inference but has not been addressed in the pattern recognition literature. We develop a decision-theoretic framework and formulate the sparse pattern recognition problem as a simultane...
متن کاملContinuous speech recognition with sparse coding
Sparse coding is an efficient way of coding information. In a sparse code most of the code elements are zero; very few are active. Sparse codes are intended to correspond to the spike trains with which biological neurons communicate. In this article, we show how sparse codes can be used to do continuous speech recognition. We use the TIDIGITS dataset to illustrate the process. First a waveform ...
متن کامل